Deep Learner


Generative Modeling: A Review

Polson, Nick, Sokolov, Vadim

arXiv.org Artificial Intelligence

Generative methods (Gen-AI) are reviewed with a particular focus on solving tasks in Machine Learning and Bayesian inference. Generative models require one to simulate a large training dataset and to use deep neural networks to solve a supervised learning problem. To do this, we require high-dimensional regression methods and tools for dimensionality reduction (a.k.a. feature selection). The main advantage of Gen-AI methods is that they are model-free and use deep neural networks to estimate conditional densities or posterior quantiles of interest. To illustrate generative methods, we analyze the well-known Ebola dataset. Finally, we conclude with directions for future research.
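The simulate-then-regress idea can be sketched in a few lines. This is a hypothetical toy, not the paper's estimator: nearest-neighbour binning stands in for the deep quantile neural network, pairs (theta, y) are simulated from an assumed Gaussian prior and model, and posterior quantiles of theta are read off from the simulated pairs whose outputs land near the observation.

```python
import random

random.seed(0)

# Likelihood-free posterior-quantile sketch: simulate (theta, y) pairs
# from the prior and model, then read off conditional quantiles of theta
# for pairs whose y landed near the observed value.
def simulate_pair():
    theta = random.gauss(0.0, 1.0)       # prior draw (assumed N(0,1))
    y = theta + random.gauss(0.0, 0.5)   # simulated observation
    return theta, y

train = [simulate_pair() for _ in range(20000)]

def posterior_quantile(y_obs, q, width=0.1):
    # Empirical q-quantile of theta among simulations near y_obs --
    # a crude stand-in for a trained deep quantile network.
    thetas = sorted(t for t, y in train if abs(y - y_obs) < width)
    return thetas[int(q * (len(thetas) - 1))]

median = posterior_quantile(1.0, 0.5)    # posterior median given y = 1.0
```

For this conjugate toy the exact posterior mean given y = 1.0 is 0.8, so the simulated median lands nearby; a deep quantile estimator replaces the binning step when y is high dimensional.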


Estimating the treatment effect over time under general interference through deep learner integrated TMLE

Guo, Suhan, Shen, Furao, Li, Ni

arXiv.org Artificial Intelligence

Understanding the effects of quarantine policies in populations with underlying social networks is crucial for public health, yet most causal inference methods fail here due to their assumption of independent individuals. We introduce DeepNetTMLE, a deep-learning-enhanced Targeted Maximum Likelihood Estimation (TMLE) method designed to estimate time-sensitive treatment effects in observational data. DeepNetTMLE mitigates bias from time-varying confounders under general interference by incorporating a temporal module and domain adversarial training to build intervention-invariant representations. This process removes associations between current treatments and historical variables, while the targeting step maintains the bias-variance trade-off, enhancing the reliability of counterfactual predictions. Using simulations of a "Susceptible-Infected-Recovered" model with varied quarantine coverages, we show that DeepNetTMLE achieves lower bias and more precise confidence intervals in counterfactual estimates, enabling optimal quarantine recommendations within budget constraints, surpassing state-of-the-art methods.
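A minimal, non-networked sketch of how quarantine coverage enters a "Susceptible-Infected-Recovered" simulation. The paper's model is networked and stochastic; the deterministic recursion, the rates, and the `quarantine` parameter scaling the contact rate below are illustrative assumptions only.

```python
def sir(beta, gamma, quarantine, days, s0=0.99, i0=0.01):
    # Discrete-time SIR where quarantine coverage scales down the
    # effective contact rate (an assumed, simplified mechanism).
    s, i, r = s0, i0, 0.0
    for _ in range(days):
        eff_beta = beta * (1.0 - quarantine)
        new_inf = eff_beta * s * i    # new infections this step
        new_rec = gamma * i           # new recoveries this step
        s, i, r = s - new_inf, i + new_inf - new_rec, r + new_rec
    return s, i, r

no_q = sir(0.4, 0.1, 0.0, 200)    # no quarantine
with_q = sir(0.4, 0.1, 0.5, 200)  # 50% quarantine coverage
```

Varying `quarantine` over a grid of coverages is the kind of counterfactual sweep the estimator in the abstract is targeting, here reduced to its simplest mechanical form.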


Generative Bayesian Computation for Maximum Expected Utility

Polson, Nick, Ruggeri, Fabrizio, Sokolov, Vadim

arXiv.org Machine Learning

Generative Bayesian Computation (GBC) methods are developed to provide an efficient computational solution for maximum expected utility (MEU). We propose a density-free generative method based on quantiles that naturally calculates expected utility as a marginal of quantiles. Our approach uses a deep quantile neural estimator to directly estimate distributional utilities. Generative methods assume only the ability to simulate from the model and parameters, and as such are likelihood-free. A large training dataset is generated from the parameters and output, together with a base distribution. Our method has a number of computational advantages, primarily being density-free, with an efficient estimator of expected utility. A link with the dual theory of expected utility and risk taking is also discussed. To illustrate our methodology, we solve an optimal portfolio allocation problem with Bayesian learning and a power utility (a.k.a. the fractional Kelly criterion). Finally, we conclude with directions for future research.
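The identity behind "expected utility as a marginal of quantiles" is E[U(X)] = \int_0^1 U(Q(u)) du, where Q is the quantile function of X, and it can be checked numerically. In this toy, an empirical quantile function of simulated lognormal wealth stands in for the deep quantile neural estimator, and a power (CRRA) utility is assumed:

```python
import math
import random

random.seed(1)

# Simulated terminal wealth (assumed lognormal), sorted so that the
# k-th element approximates the (k/N)-quantile.
samples = sorted(math.exp(random.gauss(0.05, 0.2)) for _ in range(100000))

def Q(u):
    # Empirical quantile function -- stand-in for a deep quantile net.
    return samples[min(int(u * len(samples)), len(samples) - 1)]

def U(w, gamma=2.0):
    # Power (CRRA) utility.
    return (w ** (1 - gamma) - 1) / (1 - gamma)

# Expected utility as a marginal over quantiles: average U(Q(u)) on a
# uniform grid of u in (0, 1)...
grid = [(k + 0.5) / 1000 for k in range(1000)]
eu_quantile = sum(U(Q(u)) for u in grid) / len(grid)

# ...versus the direct Monte Carlo average of U over the samples.
eu_direct = sum(U(x) for x in samples) / len(samples)
```

The two estimates agree closely, which is the computational point: once quantiles are learned, expected utility needs no density evaluation at all.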


Merging Two Cultures: Deep and Statistical Learning

Bhadra, Anindya, Datta, Jyotishka, Polson, Nick, Sokolov, Vadim, Xu, Jianeng

arXiv.org Machine Learning

Merging the two cultures of deep and statistical learning provides insights into structured high-dimensional data. Traditional statistical modeling is still a dominant strategy for structured tabular data. Deep learning can be viewed through the lens of generalized linear models (GLMs) with composite link functions. Sufficient dimensionality reduction (SDR) and sparsity perform nonlinear feature engineering. We show that prediction, interpolation and uncertainty quantification can be achieved using probabilistic methods at the output layer of the model. Thus a general framework for machine learning arises that first generates nonlinear features (a.k.a. factors) via sparse regularization and stochastic gradient optimization, and second uses a stochastic output layer for predictive uncertainty. Rather than using shallow additive architectures as in many statistical models, deep learning uses layers of semi-affine input transformations to provide a predictive rule. Applying these layers of transformations leads to a set of attributes (a.k.a. features) to which predictive statistical methods can be applied. Thus we achieve the best of both worlds: scalability and fast predictive rule construction, together with uncertainty quantification. Sparse regularization with unsupervised or supervised learning finds the features. We clarify the duality between shallow and wide models such as PCA, PPR and RRR, and deep but skinny architectures such as autoencoders, MLPs, CNNs, and LSTMs. The connection with data transformations is of practical importance for finding good network architectures. By incorporating probabilistic components at the output level we allow for predictive uncertainty. For interpolation we use deep Gaussian processes, and for classification, ReLU trees. We provide applications to regression, classification and interpolation. Finally, we conclude with directions for future research.
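The "GLM with composite link" view can be made concrete: each layer applies a univariate nonlinearity to an affine combination (a semi-affine map), and a stochastic output layer returns a mean and variance rather than a point prediction. The weights below are illustrative placeholders, not a trained model:

```python
import math

def relu(v):
    # Univariate nonlinearity applied elementwise.
    return [max(0.0, x) for x in v]

def affine(W, b, v):
    # Affine combination Wv + b, the inner half of a semi-affine map.
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]

def predict(x):
    # Two semi-affine layers (the "composite link")...
    h1 = relu(affine([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.1], x))
    h2 = relu(affine([[1.0, 0.5], [-0.5, 1.0]], [0.0, 0.0], h1))
    # ...then a probabilistic output layer: Gaussian mean and variance,
    # giving predictive uncertainty instead of a bare point estimate.
    mean, log_var = affine([[1.0, 1.0], [0.2, -0.2]], [0.0, -1.0], h2)
    return mean, math.exp(log_var)

mean, var = predict([1.0, 2.0])
```

The design choice the abstract argues for lives in the last two lines: the layered transformations build features, and the statistical machinery (here a Gaussian head) sits on top of them.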


Discussion of Ensemble Learning under the Era of Deep Learning

Yang, Yongquan, Lv, Haijun

arXiv.org Artificial Intelligence

Due to the dominant position of deep learning (mostly deep neural networks) in various artificial intelligence applications, ensemble learning based on deep neural networks (ensemble deep learning) has recently shown significant performance in improving the generalization of learning systems. However, since modern deep neural networks usually have millions to billions of parameters, the time and space overheads for training multiple base deep learners and for testing with the ensemble deep learner are far greater than those of traditional ensemble learning. Though several fast ensemble deep learning algorithms have been proposed to promote the deployment of ensemble deep learning in some applications, further advances are still needed for many applications in specific fields, where development time and computing resources are usually restricted or the data to be processed is of high dimensionality. An urgent problem that needs to be solved is how to retain the significant advantages of ensemble deep learning while reducing the required time and space overheads, so that many more applications in specific fields can benefit from it. To alleviate this problem, it is necessary to understand how ensemble learning has developed in the era of deep learning. Thus, in this article, we present a discussion focusing on data analyses of published works, the methodology and limitations of traditional ensemble learning, and recent developments in ensemble deep learning. We hope this article will help readers appreciate the technical challenges facing future developments of ensemble learning in the era of deep learning.
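The core trade-off in the abstract, better generalization versus multiplied training and inference cost, can be illustrated by stubbing each base deep learner as a noisy predictor of a true value; averaging members reduces error, at n times the per-member cost. All quantities here are synthetic:

```python
import random

random.seed(2)

TRUE = 3.0  # synthetic ground truth the "learners" try to predict

def base_learner_prediction():
    # Stand-in for one independently trained deep network: unbiased
    # but noisy around the true value.
    return TRUE + random.gauss(0.0, 1.0)

def ensemble_prediction(n_members):
    # Ensemble deep learning reduced to its simplest form: average
    # the members' outputs (paying n_members times the cost).
    return sum(base_learner_prediction() for _ in range(n_members)) / n_members

single_errs = [abs(base_learner_prediction() - TRUE) for _ in range(2000)]
ens_errs = [abs(ensemble_prediction(10) - TRUE) for _ in range(2000)]
mean_single = sum(single_errs) / len(single_errs)
mean_ens = sum(ens_errs) / len(ens_errs)
```

The variance of an average of n independent members shrinks by a factor of n, which is exactly the advantage that fast-ensemble methods try to keep while avoiding the n-fold overhead.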


When SIMPLE is better than complex: A case study on deep learning for predicting Bugzilla issue close time

Yedida, Rahul, Yang, Xueqi, Menzies, Tim

arXiv.org Artificial Intelligence

Is deep learning over-hyped? Where are the case studies that compare state-of-the-art deep learners with simpler options? In response to this gap in the literature, this paper offers one case study on using deep learning to predict issue close time in Bugzilla. We report here that a SIMPLE extension to a decades-old feedforward neural network works better than the more recent, and more elaborate, "long short-term memory" (LSTM) deep learners (which are currently popular in the SE literature). SIMPLE is a combination of a fast feedforward network and a hyper-parameter optimizer. SIMPLE runs in 3 seconds while the newer algorithms take 6 hours to terminate. Since it runs so fast, it is more amenable to being tuned by our optimizer. This paper reports results seen after running SIMPLE on issue close time data from 45,364 issues raised in the Chromium, Eclipse, and Firefox projects from January 2010 to March 2016. In our experiments, this SIMPLEr tuning approach achieves significantly better predictors for issue close time than the more complex deep learner. These better and SIMPLEr results can be generated 2,700 times faster than with a state-of-the-art deep learner. From this result, we draw two conclusions. Firstly, for predicting issue close time, we would recommend SIMPLE over complex deep learners. Secondly, before analysts try very sophisticated (but very slow) algorithms, they might achieve better results, much sooner, by applying hyper-parameter optimization to simple (but very fast) algorithms.
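The SIMPLE recipe, a fast learner wrapped in a hyper-parameter optimizer, can be sketched with random search. The loss surface below is a synthetic stand-in for "train the feedforward net and measure validation error", and the search ranges are assumptions for illustration, not the paper's actual configuration:

```python
import random

random.seed(3)

def validation_error(lr, hidden):
    # Hypothetical loss surface: best around lr = 0.01, hidden = 32.
    # A real SIMPLE-style loop would train the fast feedforward net
    # here and return its measured validation error.
    return (abs(lr - 0.01) * 50) ** 2 + ((hidden - 32) / 32) ** 2

# Random search: cheap per-trial cost is what makes the fast learner
# tunable at all -- the abstract's central argument.
best = None
for _ in range(200):
    lr = 10 ** random.uniform(-4, -1)            # log-uniform learning rate
    hidden = random.choice([8, 16, 32, 64, 128])  # hidden-layer width
    err = validation_error(lr, hidden)
    if best is None or err < best[0]:
        best = (err, lr, hidden)

best_err, best_lr, best_hidden = best
```

Two hundred 3-second trials fit in ten minutes; two hundred 6-hour trials do not, which is why the slow learner effectively cannot be tuned this way.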


Selecting Data Adaptive Learner from Multiple Deep Learners using Bayesian Networks

Kobayashi, Shusuke, Shirayama, Susumu

arXiv.org Artificial Intelligence

A method to predict time series using multiple deep learners and a Bayesian network is proposed. In this study, the input explanatory variables are Bayesian network nodes that are associated with learners. The training data are divided using K-means clustering, and multiple deep learners are trained, one per cluster. A Bayesian network is used to determine which deep learner is in charge of predicting a time series. We determine a threshold value and select learners with a posterior probability equal to or greater than the threshold, which can facilitate more robust prediction. The proposed method is applied to financial time-series data, and predicted results for the Nikkei 225 index are demonstrated.
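The cluster-then-route idea can be sketched without a Bayesian network: a toy 1-D k-means splits the training data, one least-squares "learner" is fitted per cluster, and a new input is routed to the nearest cluster's learner. Nearest-cluster routing is a crude stand-in for the posterior-probability selection step; the data and learners below are synthetic:

```python
import random

random.seed(4)

# Two synthetic regimes with different linear dynamics.
data = ([(x, 2 * x) for x in [random.uniform(0, 1) for _ in range(100)]] +
        [(x, -3 * x + 10) for x in [random.uniform(4, 5) for _ in range(100)]])

# Toy 1-D k-means with k = 2 (the paper clusters the training data).
centers = [0.5, 4.5]
for _ in range(10):
    groups = [[], []]
    for x, y in data:
        groups[0 if abs(x - centers[0]) < abs(x - centers[1]) else 1].append((x, y))
    centers = [sum(x for x, _ in g) / len(g) for g in groups]

def fit(g):
    # Least-squares line per cluster -- stand-in for a per-cluster deep learner.
    n = len(g); sx = sum(x for x, _ in g); sy = sum(y for _, y in g)
    sxx = sum(x * x for x, _ in g); sxy = sum(x * y for x, y in g)
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    return slope, (sy - slope * sx) / n

learners = [fit(g) for g in groups]

def predict(x):
    # Route to the learner of the nearest cluster (proxy for selecting
    # by Bayesian-network posterior probability).
    k = 0 if abs(x - centers[0]) < abs(x - centers[1]) else 1
    slope, intercept = learners[k]
    return slope * x + intercept
```

Because each regime gets its own specialist, inputs from either regime are predicted by the model that actually saw similar data, which is the robustness argument in the abstract.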


Uncertainty Quantification in Multimodal Ensembles of Deep Learners

Brown, Katherine E. (Tennessee Technological University) | Bhuiyan, Farzana Ahamed (Tennessee Technological University) | Talbert, Douglas A. (Tennessee Technological University)

AAAI Conferences

Uncertainty quantification in deep learning is an active area of research that examines two primary types of uncertainty in deep learning: epistemic uncertainty and aleatoric uncertainty. Epistemic uncertainty is caused by not having enough data to adequately learn. This creates volatility in the parameters and predictions and causes uncertainty. High epistemic uncertainty can indicate that the model’s prediction is based on a pattern with which it is not familiar. Aleatoric uncertainty measures the uncertainty due to noise in the data. Two additional active areas of research are multimodal learning and malware analysis. Multimodal learning takes into consideration distinct expressions of features such as different representations (e.g., audio and visual data) or different sampling techniques. Multimodal learning has recently been used in malware analysis to combine multiple types of features. In this work, we present and analyze a novel technique to measure epistemic uncertainty from deep ensembles of modalities. Our results suggest that deep ensembles of modalities provide higher accuracy and lower uncertainty than the constituent single modalities and than the comparable hierarchical multimodal deep learner.
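A minimal sketch of ensemble-based epistemic uncertainty: member disagreement (the variance of predictions across the ensemble) serves as the uncertainty estimate, and it grows for inputs far from the familiar range. The "members" here are stubbed perturbed predictors, not trained networks:

```python
import random

random.seed(5)

def make_member():
    # Stand-in for one independently trained model: a slightly
    # perturbed version of the same underlying predictor.
    a = 1.0 + random.gauss(0.0, 0.05)
    return lambda x: a * x

ensemble = [make_member() for _ in range(20)]

def predict_with_uncertainty(x):
    # Mean of member predictions = the ensemble prediction;
    # variance across members = the epistemic uncertainty estimate.
    preds = [m(x) for m in ensemble]
    mean = sum(preds) / len(preds)
    var = sum((p - mean) ** 2 for p in preds) / len(preds)
    return mean, var

_, var_near = predict_with_uncertainty(1.0)    # familiar input scale
_, var_far = predict_with_uncertainty(100.0)   # far from training range
```

Large disagreement on the out-of-range input is exactly the "pattern with which it is not familiar" signal described in the abstract.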


Why Small Data is Important for Advancing AI

#artificialintelligence

Everything was small data before we had big data. The scientific discoveries of the 19th and 20th centuries were all made using small data. Physicists made all calculations by hand, thus exclusively using small data. And yet, they discovered the most beautiful and most fundamental laws of nature. Moreover, they compressed them into simple rules in the form of elegant equations.


Deep Learning for Spatio-Temporal Modeling: Dynamic Traffic Flows and High Frequency Trading

Dixon, Matthew F., Polson, Nicholas G., Sokolov, Vadim O.

arXiv.org Machine Learning

Deep learning applies layers of hierarchical hidden variables to capture interactions and nonlinearities in the data. The theoretical roots lie in the Kolmogorov-Arnold representation theorem (Arnold, 1957; Kolmogorov, 1957) for multivariate functions, which states that any continuous multivariate function can be expressed as a superposition of continuous univariate semi-affine functions. This remarkable result has direct consequences for statistical modeling: deep learning acts as a nonparametric pattern-matching algorithm. Deep learning relies on pattern matching via its layers of univariate semi-affine functions and can be applied to both regression and classification problems. Deep learners provide a nonlinear predictor in complex settings where the input space can be very high dimensional.
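The superposition claim can be illustrated directly: the predictor below is built solely from the named building block, a univariate nonlinearity applied to an affine combination (a semi-affine function), composed over two layers. The weights are illustrative, not trained:

```python
import math

def semi_affine(weights, bias, v):
    # Univariate nonlinearity (tanh) applied to an affine combination
    # of the inputs -- the single building block of the superposition.
    return math.tanh(sum(w * x for w, x in zip(weights, v)) + bias)

def deep_predictor(x):
    # Layer 1: two semi-affine units over the raw inputs.
    h = [semi_affine([1.0, -0.5], 0.1, x),
         semi_affine([0.3, 0.8], -0.2, x)]
    # Layer 2: the same building block over the hidden units.
    return semi_affine([1.5, -1.0], 0.0, h)

y = deep_predictor([0.4, 0.7])
```

Nothing multivariate is ever applied directly: every nonlinearity is univariate, and all mixing of inputs happens in the affine combinations, which is the structural content of the representation theorem for this setting.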